library(tidyverse)
library(dplyr)
library(janitor)
library(here)
library(wordcloud)
library(RColorBrewer)
library(patchwork)
library(ggwordcloud)
library(paletteer)

# figure out how to hide the text output and keep the code, ideally allowing the reader to open and close the code in the knitted document through some variation of: include = FALSE, collapse = TRUE, class.source = 'fold-hide', results = 'hide'
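One possible approach to the note above, offered as a sketch rather than the setup this post actually uses: assuming the post is knitted with knitr, chunk-level defaults can silence package startup messages and warnings while keeping the code visible. Code folding itself is usually configured in the YAML header rather than in R (for example, `code_folding: hide` under `html_document` in rmarkdown), so it is not shown here.

```r
# Assumed setup chunk (not from the original post); it would go before the library() calls.
knitr::opts_chunk$set(
  echo = TRUE,      # keep the code in the knitted document
  message = FALSE,  # hide package startup messages
  warning = FALSE   # hide warnings
)
```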

It’s been a week since we finished the six-week, five-class, jam-packed, inaugural summer intensive quarter of the Master’s of Environmental Data Science at UCSB’s Bren School of Environmental Science & Management. We spent the week partly relaxing after weeks of coding boot camp and partly developing this blog to prove that we learned quite a bit :)

As we reflected on the first quarter of our degree, we decided it may be more interesting to show, rather than tell, some of the skills we’ve learned. We also thought it might be fun to share reflections from our whole cohort to truly represent the student experience of this fast-paced, learning-filled summer.

Survey Development

To that end, we developed a survey in Google Forms to send to our classmates and gather data about their perspectives on these first six weeks. We had two main questions: 1.) what did the cohort think of our classes? and 2.) what kinds of fun things has the group been up to in our delightful home of Santa Barbara?

We asked our peers for feedback on these questions through the form. We designed the questions with data tidying and visualization in mind, but some unexpected answers showed us how the survey could have been written to require a bit less wrangling. MEDS students, however, do not shy away from problematic data, so we worked through the problems that arose :)

Data Visualizations

The following is a description of how we visualized the most interesting survey data submitted by our peers, including neat graphics describing our MEDS summer, from tidying data to surfing after class. We hope you find it insightful, but also fun :)

Our Data

First, we downloaded the responses from our Google Forms survey as a .csv file and read that file into the R project we made for this blog.

data <- read_csv(here("data", "MEDS Summer Reflection Survey (Responses) - Responses Clean.csv"))

Data Tidying

Next, we renamed the columns. Google Forms does this frustrating but understandable thing where the column names in the exported data are the questions we asked participants. This makes for super long names that are not fun to write code with. Instead, we named each column of interest after the week(s) of the corresponding course, e.g. column "1" holds the responses for our week 1 course, EDS 212.

# rename the long question columns; colnames(data) lists the original names
data_clean <- data %>% 
  rename("1" =  "Write 3 words to describe or represent week 1 (EDS212 w/ Allison) here:") %>% 
  rename("2_3" =  "Write 3 words to describe or represent weeks 2 & 3 (EDS221 w/ Allison) here:") %>% 
  rename("4" =  "Write 3 words to describe or represent week 4 (EDS214 w/ Julien) here:") %>% 
  rename("5" =  "Write 3 words to describe or represent week 5 (EDS215 w/ Frew) here:") %>% 
  rename("6" =  "Write 3 words to describe or represent week 1 (EDS216 w/ Scott) here:")
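As a side note, the same kind of renaming can be written as a single rename() call, and names that start with a letter do not need quoting later on. The sketch below uses a made-up two-column data frame (the column names and the new names `email` and `week_1` are hypothetical, not the ones used in this post).

```r
library(dplyr)
library(tibble)

# Toy data frame standing in for a survey export (column names are made up)
survey <- tibble(
  `Email Address` = c("a@example.com", "b@example.com"),
  `Three words for week 1:` = c("fun fast math", "hard fun new")
)

# rename() takes new_name = old_name pairs, and several pairs fit in one call
survey_renamed <- survey %>%
  rename(
    email = `Email Address`,
    week_1 = `Three words for week 1:`
  )

names(survey_renamed)
```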

Data Wrangling

The Google form format included five questions (one for each summer course) where students wrote in three words to describe how they felt about the course. In the .csv of the survey data, the three words submitted by a participant were grouped together in one cell per course.

To visualize how often certain descriptors for each summer course appear, we first needed to separate the three terms submitted for each course into separate observations. We executed this using the separate_rows() function. This expanded the terms into three separate observations per student in each class column.

# separate the columns into rows by parsing the 3 words in each observation into 3 different observations
# select certain cols because our first data viz is only using certain cols 
data_clean_1 <- data_clean %>% 
  separate_rows("1") %>% 
  select("Email Address","1")

data_clean_2_3 <- data_clean %>% 
  separate_rows("2_3")%>% 
  select("Email Address","2_3")

data_clean_4 <- data_clean %>% 
  separate_rows("4") %>% 
  select("Email Address","4")

data_clean_5 <- data_clean %>% 
  separate_rows("5") %>% 
  select("Email Address","5")

data_clean_6 <- data_clean %>% 
  separate_rows("6") %>% 
  select("Email Address","6")
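To see what separate_rows() does on a small scale, here is a sketch with a made-up two-person data frame (the `email` and `week_1` columns are hypothetical). With its default separator, separate_rows() splits on any run of characters that are not letters, digits, or periods, so both spaces and commas act as delimiters.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Two toy responses, each with three words in a single cell
responses <- tibble(
  email = c("a@example.com", "b@example.com"),
  week_1 = c("fun fast math", "hard, fun, new")
)

# Each word becomes its own row; the email is repeated for every word
words_long <- responses %>%
  separate_rows(week_1)

words_long
```

One thing to keep in mind with the default separator is that a hyphenated descriptor such as "hands-on" would be split into two words.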

Then, to observe the frequency of words used to describe each course, we used the table() function. This creates a table with two columns: one for each distinct descriptive word students used and one for the number of times that word occurs for the class. Then, we converted that table to a data frame.

# use the table() function to take counts of each "factor" (words) and use the data.frame() function to convert these tables to data frames

course_1 <- data.frame(table(data_clean_1$"1"))

course_1_df <- as.data.frame.matrix(course_1)

course_2_3 <- data.frame(table(data_clean_2_3$"2_3"))

course_2_3_df <- as.data.frame.matrix(course_2_3)

course_4 <- data.frame(table(data_clean_4$"4"))

course_4_df <- as.data.frame.matrix(course_4)

course_5 <- data.frame(table(data_clean_5$"5"))

course_5_df <- as.data.frame.matrix(course_5) %>% 
  # drop three descriptors from the week 5 cloud
  filter(!Var1 %in% c("tangent", "tangents", "dry"))

course_6 <- data.frame(table(data_clean_6$"6"))

course_6_df <- as.data.frame.matrix(course_6)
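The counting step can be illustrated with a tiny made-up vector of words (the `toy` data frame is hypothetical). Passing a table to data.frame() yields one row per distinct word, with the default column names Var1 and Freq that the plotting code relies on.

```r
# Toy column of descriptors, with "fun" appearing twice
toy <- data.frame(word = c("fun", "fast", "math", "hard", "fun", "new"))

# table() counts each distinct word; data.frame() turns the counts into columns Var1 and Freq
word_counts <- data.frame(table(toy$word))

word_counts
```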

Data Visualizing

After this, we wanted to visualize the frequency of words for each class. We used ggplot2 to create word clouds, which display the descriptive words submitted by our classmates in sizes proportional to how often each word was used. The word cloud geom we used, geom_text_wordcloud(), comes from the ggwordcloud package. We mapped color to the descriptor so that each word gets its own color, and we raised the maximum word size so that the clouds were easier to read.

Since we used ggplot quite a bit in EDS 221, we opted for this method for generating word clouds over another package called wordcloud, which is specifically for word clouds. We found that we wanted to showcase our visualization skills by stacking visualizations and adding titles and colors, and it was easier to do this in a ggplot version of word clouds, since we had so much practice with other plots in ggplot.

The code chunk below is where we tried the wordcloud package. It made the correct visualizations, but we found it more difficult to make them as nice as some of the ggplots we made in class. It was a relief to learn that there is a word cloud feature in ggplot!

And finally, the ggplot word clouds!

# for each cloud, specify the background color within the theme to match the background color of the blog

cloud_1 <- ggplot(course_1_df, aes(label = Var1, size = Freq, color = Var1)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 25) +
  theme(plot.title = element_text(size = 25),
        panel.background = element_rect(fill = "white")) +
  labs(title = "Week 1 EDS 212: Essential Math in Environmental Data Science")

cloud_1

cloud_2_3 <- ggplot(course_2_3_df, aes(label = Var1, size = Freq, color = Var1)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 25) +
  theme(plot.title = element_text(size = 25),
        panel.background = element_rect(fill = "white")) +
  labs(title = "Weeks 2 & 3 EDS 221: Scientific Programming Essentials")

cloud_2_3

cloud_4 <- ggplot(course_4_df, aes(label = Var1, size = Freq, color = Var1)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 25) +
  theme(plot.title = element_text(size = 25),
        panel.background = element_rect(fill = "white")) +
  labs(title = "Week 4 EDS 214: Analytical Workflows and Scientific Reproducibility")

cloud_4

cloud_5 <- ggplot(course_5_df, aes(label = Var1, size = Freq, color = Var1)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 25) +
  theme(plot.title = element_text(size = 25),
        panel.background = element_rect(fill = "white")) +
  labs(title = "Week 5 EDS 215: Introduction to Data Storage and Management")

cloud_5

cloud_6 <- ggplot(course_6_df, aes(label = Var1, size = Freq, color = Var1)) +
  geom_text_wordcloud() +
  scale_size_area(max_size = 25) +
  theme(plot.title = element_text(size = 25),
        panel.background = element_rect(fill = "white")) +
  labs(title = "Week 6 EDS 216: Meta-Analysis and Systematic Reviews")

cloud_6

# use patchwork to stack the word clouds vertically

 cloud_1 / cloud_2_3 / cloud_4 / cloud_5 / cloud_6
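For readers new to patchwork, here is a minimal sketch of the stacking operator using two throwaway scatter plots built from the built-in mtcars data (the plots and the title are illustrative only). The `/` operator places plots on top of each other, while `|` places them side by side.

```r
library(ggplot2)
library(patchwork)

# Two placeholder plots from a built-in dataset
p_top <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p_bottom <- ggplot(mtcars, aes(hp, mpg)) + geom_point()

# `/` stacks vertically; plot_annotation() adds a title to the combined figure
stacked <- p_top / p_bottom + plot_annotation(title = "Two stacked plots")

stacked
```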

The MEDS 2022 cohort gathered at NCEAS downtown after class during summer session.

Members of the MEDS cohort in downtown Santa Barbara celebrating completing the first half of summer session classes with faculty and their pets.

# read the CSV of SB activities
data_activities <- read_csv(here("data", "sb_activities_data.csv"))

# count the votes for each SB activity and store them as a data frame for the bar chart
activities_clean <- data.frame(table(data_activities$"sb_activities"))

#SB_activities <- ggplot(activities_clean, aes(x = Var1, y = Freq)) +
#  geom_bar() +
#  coord_flip() +
#  labs(title = "MEDS Favorite Santa Barbara Activities",
#       y = "Activity")

SB_activities <- ggplot(activities_clean, aes(y = reorder(Var1, Freq), x = Freq)) +
  # geom_col() draws bars from precomputed counts; the outline color sits outside aes() so it renders as blue
  geom_col(aes(fill = Var1), color = "blue") +
  scale_fill_paletteer_d("dutchmasters::milkmaid") +
  theme(legend.position = "none",
        panel.grid = element_blank(),
        panel.background = element_rect(fill = "white")) +
  labs(title = "MEDS Favorite Santa Barbara Activities",
       y = "Activity",
       x = "Total Votes")

SB_activities

Bar chart representing the MEDS students' favorite activities in Santa Barbara. The most popular activities are surfing, going to the beach, and biking.
